An in-depth comparison of keyword specific thresholding and sum-to-one score normalization
Abstract
The quality of a spoken term detection (STD) system depends critically on the choice of a “thresholding” function, which decides from a candidate detection's score whether to output it. In the context of the IARPA Babel program and the NIST OpenKWS evaluation series, the penalty for missing an occurrence depends on the frequency of the keyword, so it is desirable either to apply different thresholds to different keywords or to normalize the scores before applying a global threshold. This paper compares two widely used thresholding algorithms, keyword-specific thresholding (KST) and sum-to-one score normalization (STO), analyzes the difference in their performance in detail, and recommends the “estimated KST” algorithm.
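The abstract gives no formulas, so what follows is a minimal Python sketch of the two schemes under stated assumptions, not the paper's implementation. It assumes the term-weighted value (TWV) metric used in Babel/OpenKWS with false-alarm weight β = 999.9, detection scores that are posteriors in [0, 1], and `total_seconds` (the amount of searched speech, in seconds) as the number of trials T. Under those assumptions, outputting a hit with posterior p pays off when p / N_true > β(1 − p) / (T − N_true), which gives the per-keyword threshold β N_true / (T + (β − 1) N_true). Here “estimated KST” is assumed to mean that N_true is estimated by summing the keyword's scores. STO instead divides each score by its keyword's score total and then applies one global threshold. All function names are hypothetical.

```python
from collections import defaultdict

# Assumption: TWV false-alarm weight used in Babel/OpenKWS scoring.
BETA = 999.9


def kst_threshold(n_true, total_seconds, beta=BETA):
    """Keyword-specific threshold (hypothetical helper).

    Keep a hit with posterior p when p / n_true > beta * (1 - p) / (T - n_true),
    i.e. p > beta * n_true / (T + (beta - 1) * n_true).
    """
    return beta * n_true / (total_seconds + (beta - 1.0) * n_true)


def estimated_kst(detections, total_seconds, beta=BETA):
    """Estimated KST sketch: n_true per keyword is assumed to be the sum of
    that keyword's detection scores; each keyword gets its own threshold."""
    n_true = defaultdict(float)
    for kw, score in detections:
        n_true[kw] += score
    return [(kw, s) for kw, s in detections
            if s > kst_threshold(n_true[kw], total_seconds, beta)]


def sum_to_one(detections):
    """STO sketch: divide each score by the total score of its keyword."""
    totals = defaultdict(float)
    for kw, score in detections:
        totals[kw] += score
    # Guard against keywords whose scores are all zero.
    return [(kw, s / totals[kw] if totals[kw] > 0 else 0.0)
            for kw, s in detections]


def global_threshold(detections, theta):
    """Apply a single threshold to every keyword (used after STO)."""
    return [(kw, s) for kw, s in detections if s > theta]


if __name__ == "__main__":
    # Toy data: one rare keyword and one frequent keyword, one hour of speech.
    dets = [("rare", 0.6), ("rare", 0.1),
            ("common", 0.8), ("common", 0.5), ("common", 0.5),
            ("common", 0.5), ("common", 0.4)]
    # KST thresholds here are about 0.163 for "rare" and 0.429 for "common".
    print(estimated_kst(dets, total_seconds=3600))
    print(global_threshold(sum_to_one(dets), 0.2))
```

In this toy run, KST gives the frequent keyword a higher threshold (about 0.43 versus 0.16), while STO shrinks each of that keyword's scores by dividing them by a larger total. This illustrates how the two schemes treat rare and frequent keywords differently.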
Similar articles
Improving the performance of a telephone-speech word spotting system using confidence score normalization based on linear programming
Conventional word spotting systems use a speech recognizer to determine hypothesized keywords and their confidence scores. Each keyword is accepted or rejected by comparing its score with a specific threshold. It has been shown that the confidence score produced by the recognizer depends strongly on the sub-word structure of each keyword. So comparing assigned scores to keyw...
Change detection from satellite images based on optimal asymmetric thresholding the difference image
As a process to detect changes in land cover by using multi-temporal satellite images, change detection is one of the practical subjects in the field of remote sensing. Any progress on this issue increases the accuracy of results, facilitates and accelerates the analysis of multi-temporal data, and reduces the cost of producing geospatial information. In this study, an unsupervised chang...
A comparison of multiple methods for rescoring keyword search lists for low resource languages
We review the performance of a new two-stage cascaded machine learning approach for rescoring keyword search output for low resource languages. In the first stage Confusion Networks (CNs) are rescored for improved Automatic Speech Recognition (ASR) by reranking the arcs of each confusion bin. In the second stage we generate keyword search hypotheses from the rescored ASR output and rescore them...
Liberating the Biometric Menagerie Through Score Normalization Improvements
by Jeffrey Richard Paone
The biometric menagerie, or biometric zoo, is a classification system used to label the matching tendencies of a given subject’s biometric signature. These tendencies may include matching their own signatures poorly or matching other subjects’ signatures better than their own. Several experiments show the biometric menagerie to be an unstable classification system where...
KAN and RinSCut: Lazy Linear Classifier and Rank-in-Score Threshold in Similarity-Based Text Categorization
Two important research areas in statistical approaches for automated text categorization are similarity-based learning algorithms and associated thresholding strategies. The combination of these techniques significantly influences the overall performance of text categorization systems. After researching common techniques in both areas, we describe a lazy linear classifier known as the keyword a...